Improving Read Performance with BP-DAGs for Storage-Efficient File Backup
Authors
Abstract
The continued growth of data and the need for high application continuity have created a critical and mounting demand for storage-efficient, high-performance data protection. New technologies, especially D2D (Disk-to-Disk) deduplication storage, have therefore received wide attention in both academia and industry in recent years. Existing deduplication systems rely mainly on duplicate locality within the backup workload to achieve high throughput, but their read performance degrades when duplicate locality is poor. This paper presents the design and performance evaluation of a D2D-based deduplication file backup system that employs caching techniques to improve write throughput and encodes files as graphs called BP-DAGs (Bi-pointer-based Directed Acyclic Graphs). BP-DAGs not only satisfy the "unique" chunk storing policy of deduplication, but also help improve file read performance for workloads with poor duplicate locality. Evaluation results show that, under representative workloads, the system achieves read performance comparable to that of non-deduplication backup systems such as Bacula, and that the metadata storage overhead of BP-DAGs is reasonably low.
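As a rough illustration of the "unique" chunk storing policy mentioned in the abstract, the sketch below stores each distinct chunk once under a fingerprint of its contents and represents a file as an ordered list of fingerprints. It does not model BP-DAGs, whose structure the abstract does not describe; the fixed-size chunking, the SHA-1 fingerprint, and the names `ChunkStore`, `backup`, and `restore` are assumptions made for this example only.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def put(self, data: bytes) -> str:
        # The fingerprint of the contents acts as the chunk identifier.
        fp = hashlib.sha1(data).hexdigest()
        # A chunk that is already present is not stored a second time.
        self.chunks.setdefault(fp, data)
        return fp

    def get(self, fp: str) -> bytes:
        return self.chunks[fp]


def backup(store: ChunkStore, payload: bytes, chunk_size: int = 4) -> list:
    """Split payload into fixed-size chunks (an assumption) and return its recipe."""
    return [
        store.put(payload[i:i + chunk_size])
        for i in range(0, len(payload), chunk_size)
    ]


def restore(store: ChunkStore, recipe: list) -> bytes:
    """Rebuild the file by fetching each chunk named in the recipe, in order."""
    return b"".join(store.get(fp) for fp in recipe)
```

Backing up `b"abcdabcdefgh"` with the default chunk size yields a three-entry recipe while the store holds only two chunks. Restoring follows the recipe chunk by chunk, which hints at why reads can slow down when the referenced chunks are scattered across storage.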
Similar resources
A Lookahead Read Cache: Improving Read Performance of Deduplication Storage for Backup Applications
Abstract—Data deduplication (for short, dedupe) is a special data compression technique and has been widely adopted especially in backup storage systems with the primary aims of backup time saving as well as storage saving. Thus, most of the traditional dedupe research has focused more on the write performance improvement during the dedupe process while very little effort has been made at read ...
TLFS: High Performance Tape Library File System for Data Backup and Archive
A tape library is seldom considered a viable place for constructing a file system, since it is a sequential write/read device. Storage virtualization technology has lately become a buzzword in technology circles. In this paper we propose a tape library file system called TLFS. The purpose of TLFS is to maintain a consistent view of mass storage so that the user can manage it effectively. Like disk f...
Offline Selective Data Deduplication for Primary Storage Systems
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunat...
USENIX Association Proceedings of the FAST 2002 Conference on File and Storage Technologies Monterey
This paper describes a network storage system, called Venti, intended for archival data. In this system, a unique hash of a block’s contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage a...
An SRP Target Mode to Improve Read Performance of SRP-Based IB-SANs
SCSI RDMA Protocol (SRP) is used to build high performance Storage Area Networks (SANs) over InfiniBand, or SRP-based IB-SANs for short. The I/O read performance is critical for many read dominant applications, such as multimedia, remote sensing, data backup, etc. However, if I/O accesses focus on a specific storage device of an IB-SAN, the local I/O performance of single device could become th...